External word segmentation of off-line handwritten text lines

نویسندگان

  • Giovanni Seni
  • Edward Cohen
چکیده

This paper describes techniques to separate a line of unconstrained (written in a natural manner) handwritten text into words. When the writing style is unconstrained, recognition of individual components may be unreliable so they must be grouped together into word hypotheses, before recognition algorithms (which may require dictionaries) can be used. Our system uses original algorithms to determine distances between components in a text line and to detect punctuation. The algorithms are tested on nearly 3000 handwritten text lines extracted from postal address blocks. We give a detailed performance analysis of the complete system and its components.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey on Word Segmentation Method for Handwritten Documents

One of the most important and challenging tasks in a handwritten recognition pipeline is the segmentation of handwritten document images into text lines and words. Several problems inherent in handwritten documents such as the difference in the skew angle between text lines or along the same text line, the existence of adjacent text lines or words touching, the existence of characters with diff...

متن کامل

Automatic Segmentation of the IAM Off-line Database for Handwritten English Text

This paper presents an automatic segmentation scheme for cursive handwritten text lines using the transcriptions of the text lines and a hidden Markov model (HMM) based recognition system. The segmentation scheme has been developed and tested on the IAM database that contains offline images of cursively handwritten English text. The original version of this database contains ground truth for co...

متن کامل

Text line and word segmentation of handwritten documents

In this paper, we present a segmentation methodology of handwritten documents in their distinct entities, namely, text lines and words. Text line segmentation is achieved by applying Hough transform on a subset of the document image connected components. A post-processing step includes the correction of possible false alarms, the detection of text lines that Hough transform failed to create and...

متن کامل

Segmentation of Handwritten Gurmukhi Text into Lines

Text line segmentation is an essential pre-processing stage for handwriting recognition in many Optical Character Recognition (OCR) systems. It is an important step because inaccurately segmented text lines will cause errors in the recognition stage. Text line segmentation of the handwritten documents is still one of the most complicated problems in developing a reliable OCR. The nature of hand...

متن کامل

Review: A Literature Survey on Text Segmentation in Handwritten Punjabi Documents

Gurumukhi script is used for Punjabi language, which is a two dimensional composition of symbols with connected and disconnected diacritics. Handwritten Gurumukhi script has some complexities like connected, overlapped text lines, words and characters. It is one of the foremost issues for errors during the recognition process. Text segmentation is a challenging job in unconstrained writer indep...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pattern Recognition

دوره 27  شماره 

صفحات  -

تاریخ انتشار 1994